19 research outputs found

    A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques. Comment: 11 pages

    A Survey of Distributed Intrusion Detection Approaches

    Distributed intrusion detection systems detect attacks on computer systems by analyzing data aggregated from distributed sources. The distributed nature of the data sources allows patterns in the data to be seen that might not be detectable if each of the sources were examined individually. This paper describes the various approaches that have been developed to share and analyze data in such systems, and discusses some issues that must be addressed before fully decentralized distributed intrusion detection systems can be made viable.

    A Distributed Economics-based Infrastructure for Utility Computing

    Existing attempts at utility computing revolve around two approaches. The first consists of proprietary solutions involving renting time on dedicated utility computing machines. The second requires the use of heavy, monolithic applications that are difficult to deploy, maintain, and use. We propose a distributed, community-oriented approach to utility computing. Our approach provides an infrastructure built on Web Services in which modular components are combined to create a seemingly simple, yet powerful system. The community-oriented nature generates an economic environment which results in fair transactions between consumers and providers of computing cycles while simultaneously encouraging improvements in the infrastructure of the computational grid itself. Comment: 8 pages, 1 figure

    ContagAlert: Using Contagion Theory for Adaptive, Distributed Alert Propagation

    The widespread use of large-scale distributed systems, e.g., Grid networks and distributed storage systems, raises the possibility of large-scale attacks on such systems. Although current technology for detecting worms and viruses in the Internet can be applied, few existing systems support fast propagation of alerts during the attack itself. This paper proposes and studies a new system to address this problem. The system, called "ContagAlert", uses contagion spreading behavior, somewhat like the spread of fads in society, in order to spread alerts. ContagAlert is able to propagate an alert while the attack is in progress, while at the same time suppressing disruptive signals generated by adversaries or false positives. The core contagion protocols in the system are completely localized, involving simple threshold checks at each node, but resulting in desired emergent threshold behavior at the network scale. Signals with too few sources fail to spread, and signals exceeding the threshold propagate across the entire network. Contagion protocols can be analyzed using bootstrap percolation models. We also present experimental results from contagion protocols running in a wide variety of topologies. Finally, we present experiments based on two real-life applications: Internet worm attacks and DoS attacks on p2p systems.
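    The localized threshold check described in this abstract can be illustrated with a minimal sketch of plain bootstrap percolation. This is not the ContagAlert protocol itself: the function names `spread_alert` and `ring`, the ring topology, and the threshold value are all hypothetical choices made for illustration.

```python
from collections import deque


def spread_alert(neighbors, sources, threshold):
    """Return the set of nodes that end up alerted.

    neighbors: dict mapping each node to a list of its neighbors (assumed topology)
    sources:   nodes that raise the alert initially
    threshold: number of alerted neighbors a node needs before it adopts the alert
    """
    alerted = set(sources)
    # How many alerted neighbors each node has seen so far.
    counts = {node: 0 for node in neighbors}
    queue = deque(alerted)
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb in alerted:
                continue
            counts[nb] += 1
            # Purely local check: a node looks only at its own counter.
            if counts[nb] >= threshold:
                alerted.add(nb)
                queue.append(nb)
    return alerted


def ring(n, k):
    """Hypothetical test topology: n nodes in a ring, each linked to its k nearest nodes per side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


if __name__ == "__main__":
    topology = ring(20, 2)
    # A single source stays below the threshold everywhere, so the signal dies out.
    print(len(spread_alert(topology, [0], threshold=2)))
    # Two adjacent sources push their neighbors over the threshold, and the alert cascades.
    print(len(spread_alert(topology, [0, 1], threshold=2)))
```

    In this toy setting a lone source never convinces any neighbor, while two adjacent sources are enough to alert the whole ring, which mirrors the emergent threshold behavior the abstract describes.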

    VisFlowConnect: NetFlow Visualizations of Link Relationships for Security Situational Awareness

    We present a visualization design to enhance the ability of an administrator to detect and investigate anomalous traffic between a local network and external domains. Central to the design is a parallel axes view which displays NetFlow records as links between two machines or domains while employing a variety of visual cues to assist the user. We describe several filtering options that can be employed to hide uninteresting or innocuous traffic such that the user can focus his or her attention on the more unusual network flows.

    Detection of Privilege Escalation for Linux Cluster Security

    Cluster computing systems can be among the most valuable resources owned by an organization.

    Cluster Security Research Challenges

    In this paper, we share insights from our group's experience building and experimenting on high performance computing clusters to support our research developing novel cluster security protection techniques and tools.